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LEAST SQUARES AND PSEUDOINVERSION 


Eugene J. Lefferts 

Mission and Systems Analysis Branch 
Mission and Trajectory Analysis Division 


SUMMARY 

9 

The purpose of post flight orbit and trajectory analysis is to establish those 
parameters and procedures which contribute to the uncertainty in the computed 
orbit. The two main techniques for post flight analyses are differential correc- 
tion procedures and regression analyses. 

The differential correction procedure is a method by which an orbit is cor- 
rected by interatively performing a least squares fit to the differences between 
computed and measured observations. The actual measured quantities must be 
corrected as a result of station location uncertainties, bias in the data, and so on. 

Regression analysis is performed upon the residuals from the differential 
correction procedure. Models of station and measurement errors are obtained 
by means of a least squares fit to these residuals. 

Inherent in both of these procedures is the need for inverting the matrix 
associated with the least squares fit. In most applications these matrices tend 
to be poorly conditioned and the resulting inversion process becomes prone to 
computational errors. Thus an investigation of these errors is a prerequisite 
to successful post flight analyses. 

In order to determine the extent of errors introduced by computational pro- 
cedures, various inversion and generalized inversion (pseudoinversion) methods 
were applied in the determination of a least square polynomial fit to data gener- 
ated by a polynomial. The use of the generalized inverse and some of its com- 
putational advantages are demonstrated. In particular it is shown that if the 
formation of the normal equations is avoided one can obtain equivalent accuracy 
using single precision arithmetic as one would obtain using double precision 
arithmetic with conventional routines. An added advantage of the generalized 
inversion routines over conventional techniques is that one always receives a 
positive indication of when computational difficulties are encountered. Using 
conventional techniques the identification of polynomial fits of order higher than 
seven are doubtful for single precision arithmetic. This order can be extended 
to eleven with double precision methods. In contrast generalized inversions 
routines give meaningful results for orders as high as the twentieth. 
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LEAST SQUARES AND PSEUDOINVERSION 


INTRODUCTION 

Inherent in most orbit determination and regression analyses is the need for 
inverting the matrix that is associated with the normal equations. In most appli- 
cations, such matrices tend to be poorly conditioned, and the inversion process 
becomes computationally difficult to perform with accuracy in limited precision 
computers. The source of this difficulty and some of its effects are discussed 
in this paper. 


MATHEMATICAL MODEL 

The mathematical model for the problems to be discussed is given by the 
linear system 

•? 

AX = Y, (1) j 


where A is an n x m matrix, Y is an n vector, and X is an m vector. In general, 
the system is overdefined; that is, n is much greater than m. It is desired to ob- 
tain a solution, X - X 0 , such that the scalar function 


G ( X) = (AX - Y) t W(AX - Y) (2) 

is minimized over all choices of X. In this equation, W is an arbitrarily preas- 
signed m x m positive definite matrix of weights. By means of the calculus, a 
necessary condition for an extremem of G (X) is obtained by requiring the vanishing 
of the gradient of G with respect to X. Thus: 


V- G(X) ~ 2 A T W(AX - Y) = 0. 


(3) 


This gives rise to the normal equations 

A T WAX = A T WY , 


(4) 
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with a solution given by 


X 0 ~ (A T WA)"* 1 A T WY, (5) 

providing that the indicated inversion can be performed. 

In most practical applications, the vector Y represents a physical measure- 
ment, / 3 , corrupted by noise, 77. This noise is generally postulated to be charac- 
terized by the statistical properties 


v = p+v, 


E{Y> = p, 

(6) 

E{(Y - p) (Y - /3) T } = E{^ t ) = Q, 

( 7 ) 


where Q is an m x m positive definite matrix, called the covariance matrix, as- 
sociated with observations. The covariance matrix associated with the solution 
X 0 is given by P , where 

E{X 0 > = |e (A t WA)" 1 A t W(/3 + 7?) j 
“ (A T WA)” 1 A T W E{/3 + 77} 

= (A T WA)" 1 A T W^ 

and 

P = e|(X 0 - E{x 0 }) (X 0 - E{X 0 }) T | 

= E |(A t WA)' 1 A T Wtj t) t WA(A t WA)" 1 } 
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(A t WA)-‘ A t WE{^t 5 t > WA(A t WA)-‘ 


3 (A T WA)“ l A T WQWA(A T \VA) -1 . 


Two particular cases in practice are considered. In one case, W is assumed to 
be the identity matrix, and the solution and covariance matrix are given by 

x 0 = (A T Ay l A' r Y t (9) 


P ~ (A T A)~ l A T QA(A T A)” t . 


( 10 ) 


This is the so called least squares solution. The second case is when W « Q" 1 . 
In this case, the solution and covariance matrix are defined by 

X 0 s (A T Q' 1 A)” 1 A T Q” 1 Y , ( U) 


P = (A t Q~* A)” 1 . 


( 12 ) 


PROBLEM STATEMENT 

It would appear that the problem is completely solved and that nothing further 
need be considered. Unfortunately, this is not the case. In most practical appli- 
cations, implementation of Equations (9) through (12) may lead to drastically poor 
results. 

As an example, consider the set of data y, , x. that is generated by the 
quadratic 1 


y = 40 x + 10 x 2 . 


(13) 


We will assume that y is to be fitted by the polynomial 

y - a Q + a j x + . . . . + a n x n , 


1 Rice, W. and Lefferts, E., "Interference Reduction Techniques for Nonlinear Devices," Quarterly 
Progress Report No. I, Contract No. DA-36-039-AMC-02208 CEI, ER-13171, Martin-Marietta Corp., 
Baltimore, Maryland. 
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where the coefficients are obtained by a least squares fit to noiseless data, 

Using a single precision routine (SHARE 7,0.027), we obtain the results that are 
listed in Table 1, 

As shown in the table, even for this simple example, complete recovery of 
the coefficients is steadily worsened as the number of terms is increased. This 
poor recovery for such small dimension matrices may be surprising, and one 
may be tempted to attribute the results to the inadequacy of the program em« 
ployed. However, this is not the ease: poor recovery is characteristic of most 
least squares programs. This is amply demonstrated in Table 2, where three 
such programs are compared for a seventh degree fit to the above quadratic, 
Equation (13). 

The difficulties encountered in the above example could be eliminated by 
employing a higher degree of precision in the computer. But the use of double 
precision is not a panacea. Besides giving false confidence in the computer re- 
sults, double precision only displaces the problem since it reappears as the 
dimension of the vector to be recovered increases, In double precision using 
conventional inversion routines, this limit is reached at about dimension fourteen. 


MATRIX CONDITIONING 

The main difficulty encountered in the solution of the normal equations for a 
least squares fit lies in the poor conditioning of the associated matrix to be in- 
verted. The concept of a poorly conditioned matrix is one that requires some 
clarification. For nonsingular matrices, such a concept is developed by 
Forsythe 2 . 

The norm of a vector, Y , is given by 

| |y| | = (Y T Y)# = (Y 2 + Y 2 * . , + Y n 2 )' A . 

For any square matrix, A, the spectral norm | |A| | is defined by 

1 |A| ! = i |XM=i 1 |aX| 1 • 


2 Forsythe, George E., "Today’s Computational Methods of Linear Algebra” Siam Review, July 
1967, Vol. 9, No. 3, pp 489*515. 
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Table 2 

Comparison of Seventh Order Fits to a Quadratic 


Coefficient 

True 

Value 

Recovery 
by SHARE 
Program 
7.0.027 

Recovery 
by SHARE 
Program 
7.0.002 

Recovery 
by SHARE 
Program 
9.4.015 

a 0 

0 

3.75 x 10~ 4 

2.98 x 10- 3 

1.60 X 10- 2 

a i 

40 

40.02 

39.85 

+39.16 

a 2 

10 

9.75 

12.053 

+22.10 

a 3 

0 

11.55 

-11.37 

-36.00 

a 4 

0 

-4.24 

30.99 

+78.00 

a 5 

0 

6.68 

-44.20 

-11.00 

a 6 

0 

-4.38 

31.60 

-42.00 

Ey 

0 

1.25 

-8.94 

26.00 


For a nonsingular matrix, A, the condition of A is defined by 

Gond. A = I |a| I • I I A" 1 1 I . 


One of the main results in using the concept of the condition of a matrix is that 
it can be applied to reflect the variations in the solution that are due to pertur- 
bations either in the matrix A or its nonhomogeneous side. Let a system be 
given by 


If is perturbed to B + SB, then X is perturbed to X + SX. The relative change 
in X is given by 
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|8X| 

|x| 


< cond . 


1 5B| 

I |b| | 


If A is subjected to a change, 8 A, then we have 



sx 



— * **— * 
x + bx 


< cond. 


1 1 8 A | ] 

I I A | | 


Unfortunately, the condition of a matrix is not easily determined, and one must 
rely on other procedures to determine how good an inverse actually is. Such a 
procedure has been given by D. Morrison 3 . His quick version, which may be 
computed by hand, is given as follows. Let A = (a. .) and A" -1 = (a ij ). Let the 
exponents of the diagonal terms of A be E 1 , E 2 . . ., E n and let the exponents of 
the diagonal terms of A" 1 be Fj, F 2 . . . , F . 

Then if 


L 


MAX 

i 


( E i + F i)> 


the number of digits lost in inversion is L, and the number of good digits in single 
precision is given by 8 - L. 


PSEUDOINVERSION 

For poorl> conditioned matrices, many authors have suggested the use of a 
pseudoinverse rather than a conventional inverse. Pseudo inversion has an added 
advantage in that it works even when the matrix to be inverted is singular. 

The pseudoinverse, or generalized inverse, performs the same function for 
singular matrices as an inverse performs for nonsingular ones. The generalized 
inverse A + may be defined for all matrices, including rectangular ones, in terms 
of two postulates. 

Postulate 1: A + AA + = A + . 

Postulate 2: AA + A = A . 


■3 

Morrison, D. D., "How Bad is a Matrix,” Space Technology Laboratories, Inc., Interoffice 
Correspondence, May 19, 1965. 
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Unfortunately, the generalized inverse is not uniquely defined by postulates 1 
and 2. However, there are many ways of adding constraints to insure the 
uniqueness. One such set of constraints defines the Penrose Pseudoinverse 
(see Appendix A), which is the one used in this paper. These added constraints, 
or restrictions, are given by two axioms. 

Axiom 1: (AA + ) T = AA + . 

Axiom 2: (A + A) T - A + A . 

For the Penrose pseudoinverse, we will use the symbol A # . 

The main utility of the Penrose pseudoinverse lies in the following result. 
Let the system of equations be given in the form 


and let 


-***♦ 

Then, for any X , 


AX' = B, 


X 0 = A # B. 


AX 0 “ B'| | 2 < || AX - B 


If for some X the equality holds, then 




a 


Thus, the solution given by the Penrose pseudoinverse is a least squares solu- 
tion. If there exists more than one least squares solution, the Penrose pseudo- 
inverse gives the shortest vector in norm which is a solution. 

In terms of the generalized inverse, Equations (5), (8), (9), (10), (11), and 
(12) may be rewritten as follows: 

x 0 = (A t WA) # A t WY (14) 


l 


p = 

(A T WA) # A T WQWA(A t WA) # 

( 15 ) 


X 0 = (A t A)^ A t y 

( 16 ) 

p 

- (A T A) # A t QA(A t A) # 

( 17 ) 

x o 

- (A t Q~ l A) # A T Q _1 Y 

( 18 ) 

p 

= (A T Q " 1 A) # 

( 19 ) 


COMPUTATIONAL EXPERIMENTS 

To determine whether there are computational advantages in using a pseudo 
inverse, a comparison was made between inverse derived by the Gauss-Jordon 
technique and a pseudcinveree derived with the use of an Andree algorithm. (A 
description of the Andree algorithm appears in Appendices B and C.) 

The symmetric and positive definite test matrix A was defined as follows: 

A = BB t , 


where 


\ B = ( b ij> 

>j 

tj 

I and 



(~1) J (i - 1)1 
(i - j)'- (j - 1 )! 


b. 


ij = 0, i < j 


The inverse matrix A* -1 was given by 


A ' 1 = B t B 


11 

It: 


!1 

i 

j • J 

it! 


I 


1 
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The comparison was performed on a Univac 1108 computer in double precision. 
All elements of A were integers, so no round-off was generated upon their entry 
into the computer. The results were then tabulated and summarized as repre- 
sented in Table 3 and Figures 1 and 2. 

The comparison of the two inverses shows that the Gauss-Jordon inverse 
provides slightly better results than the pseudoinverse. At the point where there 
was a computational loss of rank, the pseudoinverse bears no resemblance to 
the conventional inverse. In general, the pseudoinverse is not a continuous func- 
tion of its elements, and it changes considerably when its rank is reduced. For 
example, consider: 


A 




Then, 


and 


A* = 


A # 



for a / 1 


a - 1 . 


Therefore, rather than compare pseudoinverses to inverses, it is more fruitful 
to compare recoverable solutions to a set of equations. 

The second computational experiment was a comparison between the solutions 
of a least squares polynomial fit to data generated by the equation 

y - 1 + lOx + x 2 . 

The solutions were obtained using the Gauss-Jordon inverse and the Andree 
algorithm pseudoinverse of the normal equations. The results are as follows. 
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Table 3 

Loss of Significant Digits and Symmetry in Matrix Inversion 


n 

Lost 

Digits 

Predicted 

Lost Digits 

Symmetry 

Gauss -Jordon 
Method 

Andree 

Algorithm 

Method 

Gauss -Jordon 
Method 

Andree 

Algorithm 

Method 

4 

1 

1 

2 

15 

16 

5 

2 

2 

3 

14 

16 

6 

3 

2 

3 

13 

16 

7 

3 

4 

4 

12 

16 

8 

4 

5 

5 

11 

16 

9 

5 

6 

6 

10 

16 

10 

6 

7 

7 

10 

16 

11 

7 

8 

8 

9 

16 

12 

8 

9 

9 

7 

16 

.13 

9 

9 

— 

7 

16 

14 

10 

11 

— 

6 

16 

15 

11 

12 

— 

5 

16 

16 

12 

13 

— 

4 

16 

17 

12 

14 

— 

3 

16 

18 

13 

14 

— 

3 

16 

19 

14 

15 

— 

2 

16 

20 

15 

16 

— 

1 

16 


11 















12 


Figure 1. Loss of Significant Decimal Digits in Matrix Inversion 


GAUSS-JORDAN INVERSION 



higure 2. Number of Significant Decimal Digits of Symmetry Versus Dimension 


The Gauss-Jordon fit is slightly better for polynomials up to the tenth order. 
For fits from the eleventh order to the twentieth, which is as far as the investiga- 
tion was carried, the Gauss-Jordon recovery is markedly inferior. It is inter- 
esting to note that from the eleventh order polynomial onward the computational 
rank of the pseudoinverse is consistently less than maximal. Table 4 and Fig- 
ure 3 list and chart the errors in the recovery for both procedures, 

The loss of rank in the Andree solution may be caused by the loss of positive 
definiteness in the formation of the matrix of the normal equations, or it could 
be due directly to computational errors in the inversion algorithm. To investi- 
gate which of these is the source of trouble, a second pseudoinverse algorithm, 
based upon a Gram-Schmidt orthogonalizatiori procedure, was investigated. 
(Appendices B and D provide a description of the Gram-Schmidt algorithm.) The 
Gram-Schmidt algorithm computes the pseudoinverse of a matrix directly, with- 
out the need of forming the normal equations. However, it can be applied to the 
normal equations to enable comparison with other procedures . 

When applied to the rectangular matrix, the Gram-Schmidt pseudoinverse 
was found to be markedly superior to both the Gauss-Jordon inverse and the 
Andree algorithm pseudoinverse. (See Table 4.) In fact it could do as well in 
single precision as the other two inverses could do in double precision. When 
the Gram-Schmidt method was applied to the normal equations, it was found to 
be inferior to the Andree algorithm. Thus, it appears that the formation of the 
normal equations is extremely critical in least squares solutions. 

One of the main advantages in the pseudoinversion routines is the manner in 
which the computational rank is controlled. In the Andree algorithm, a series of 
congruent transformations is made. In each step, the reduction is forced to pivot 
about the largest diagonal element. If at any step of the process the largest re- 
maining diagonal element is less than the product of a preassigned constant, K , 
and the first pivotal element, then it and all remaining diagonal elements are set 
to zero. Thus, the rank is depressed. The constant K is chosen to provide n 
significant figures in the ratio between the pivotal element under consideration 
and the largest pivotal element. Even when K is set to zero, the rank is reduced 
if any diagonal element becomes negative. 

In the Gram-Schmidt orthogonalization process, the rank is reduced by com- 
paring the size of the angle between any column vector and its projection on the 
independent column space already determined. This procedure does not preserve 
the non-negativeness of the matrix when applied to the normal equations. 

A comparison was made in which the rank controlling factor was varied. 

The constant K was chosen as 
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Eighth Order Fit Eleventh Order Fit I Sixteenth Order Fit 

<a n -b n ) ( a n ~ b n ) (a n "b o ) 
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Figure 3. Error in Solution Vector Versus Order of Fit 


16 





K - 2~ 54 • 10 n , 

and n was varied from 0 to 15. For n = 0, the rank of the matrix of normal 
equations was depressed by two in the Angree algroithm, while the Gram -Schmidt 
method gave maximal rank. This is due probably to the fact that non-negativeness 
of the matrix is forced as a constraint in the Angree algorithm. As n was in- 
creased, the computational rank was decreased. 

As the rank is decreased, the recovery is improved and then worsened. 

(See Figure 4.) The initial improvement is due to the cancellation of computa- 
tional errors. The subsequent worsening is probably due to the rejection of 
observable information. 

In Figure 5 the computational ,ank is plotted against the parameter n . In 
Figure 6 the norm of the error Li the recovery vector is plotted versus the 
parameter n. Observe that the recovery given by the Andree algorithm corre- 
sponding to a given rank is independent of n , while the recovery in the Gram- 
Schmidt process is different even though the rank stays fixed. This is due to the 
manner in which the Gram -Schmidt process was implemented. Procedures are 
available to insure the uniqueness of the recovery for a given rank- 

Observe from Figures 5 and 6 that for n = 0, 1, 2, and 3 the two Gram- 
Schmidt recoveries are of maximum rank, but the ratio of error vectors is 10 8 . 
This error difference can be due only to the formation of the normal equations. 


CONCLUSIONS AND RECOMMENDATIONS 

On the basis of the preliminary studies described in this report, practical 
solutions to least squares problems are somewhat in doubt due to the poor con- 
ditioning of the associated matrices. Even though these results were demon- 
strated for the impractical problem of high degree polynomial fitting, the con- 
clusions are valid for many other problems. For example, recovery of station 
parameters based upon a regression analysis from one or two satellite passes 
leads to matrices which are in many cases even more poorly conditioned. 

The results of this investigation lead to the conclusion that the formation of 
the normal equations is extremely critical and should be avoided where possible. 
Unfortunately, due to the large number of data points which must be processed 
in practical problems, this leads to very large matrices with the incumbent 
storage limitations of modern computers. Thus, in practice, one is forced to 
treat the normal system of equations. Here, the pseudoinverse of Penrose leads 
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to solutions of greater confidence. Inherent in the use of pseudoinversion 
routines is that an indication of computational difficulty is provided. This is 
evident in the reduction of computational rank from maximal. 


APPENDIX A 


PENROSE PSEUDOINVERSE 


Let A ff be defined as the Penrose pseudoinverse of A, where A n satisfies the 
four axioms: 

Axiom 1: AA # A = A, 

Axiom 2: A # AA # = A # . 

Axiom 3: (AA # ) T = A #t A t = AA # . 

Axiom 4: (A # A) T = A T A #T = A # A, 

We will show that the Penrose pseudoinverse always exists and is unique. 

If the matrix A is non-singular, then the pseudoinverse is identical to the inverse. 

THEOREM A -l: If A -1 exists, then A # = A - * 1 . 


Proof: 


AA n A = A, 


A- 1 AA # AA" 1 = (A _1 A) A # (AA** 1 ) = A^AA" 1 , 


I A # I = A" 1 ! = lA" 1 = A- 1 , 


A # = A' 1 . 


THEOREM A -2: A # is unique. 

Proof: Assume X and Y are Penrose pseudoinverses of A. Then, both X 
and Y satisfy axioms 1 through 4. Thus: 


= XAX = (X A) X = (XA) t X = A t X t X = (A T Y T A T )X T X 


X 


= (A t Y t )A t X t X = (YA) t A t X t X = YAA t X t X = YA(A T X T ) X 
s YA(XA) t X = YA(XAX) = YAX = Y(AX) = Y(AX) T = YX T A T 
= YX t A t Y t A t = Y(X t A t ) (Y t A t ) = Y(AX) t (AY) t 
= YAXAY = Y(AXA)Y = YAY = Y. 

THEOREM A-3: A* T = A T# . 

Proof: Since (A T ) # is the unique Penrose pseudoinverse of A T , it is sufficient 
to show that A #T satisfies axioms 1 through 4: 

(1) AA # A = A. 

Transposing, 

A 1 A #t A t = A t . 

(2) A # AA # = A # . 

Transposing, 

a #t a t a #t = a # t . 

(3) (AA # ) t = A A* = A #t A. 

Thus, 

(A #X A T ) T = (AA*) T = A A* = A* t A t . 

(4) (A # A) T = A t A# t = A* A. 

Thus, 

(A t A #t ) t = (A # A) t = A # A = A t A # t . 


A-2 


THEOREM A -4: 


(A # ) # = A, 

Proof: Since A # is the pseudoinverse of A, it follows that A is the pseudo- 
inverse of (A*), from the symmetry of the axioms. By definition (A # ) # is the 
pseudoinverse of A # , thus ( A **)** = A by theorem A -2. 

THEOREM A-5: A# exists. 

Proof X: Let A be a diagonal matrix: 

A = diag(i\. 1 , k 2 , .... \ n ) . 

Define A # to be the diagonal matrix: 

A# = diag (fi v /i 2 , /x n ), 

where 

Mi = lA it k L 4 0 

Mi = 0, \ = 0 . 

We now show that A # satisfies axioms 1 and 2: 

(1) AA # A = diag (^. ) diagOu. ) diag(\. ) = diag ( A .. 2 /i. ) = diag(\.) = A. 

(2) A # AA # = diag(/x i ) diag(\ i ) diag^. ) = diagOu. 2 \.) = diag(> : ) = A # . 

Since the product of diagonal matrices is diagonal, symmetry is preserved, thus 
satisfying axioms 3 and 4. 

Proof 2: Let A be symmetric; that is, let A = A T . Then, A has the repre- 
sentation 

A = S t DS, 


A -3 


where D is a diagonal matrix [D = diag (A^)] and S is orthogonal, that is S T = 
S“ l . Define A u as: 

A* a S T D«S, 

We now show that A u satisfies axioms 1 through 4: 

(1) AA # A = S T DSS T D*SS T DS # = S T DD # DS = S T DS = A. 

(2) A U AA U = S T D ft SS T DSS T D # S = S T D#DD # S ~ S T D^S - A # , 

(3) (AA#) T = (S t DSS t D # S) t = (S t DD^S) t = S t (DD") t S = S X DB # S 

= S T DSS T D*S = AA U . 

(4) (A # A) T s (S T D*SS T DS) T = (S t D # DS) t = S T (D*D) T S ~ S T D*DS 

= S T D*SS T DS = A # A. 

Proof 3: Let A be arbitrary, then we define A # as either 

A* = (A X A)# A t 
or 

A # = A t (AA t ) # , 

i 

A 

depending upon which resulting symmetric matrix, A T A or AA T , has the smaller 
dimension. 

We now show that A # satisfies axioms 1 through 4: 

(1) AA # A = A(A r A) # A T A 


A~4 



Let 


C a A(A t A) w A t A - A, 

* 


then C T is given by 


C T = A t A(A t A)#A t - A T 


and C T C is given by 

C T C = fA T A(A T A) # A T - A T j j^A(A T A)# A T A - A 
= A t A(A t A) # A t A(A t A) # A t A - A T A(A T A)» A T A 


- A t A(A t A) # a t a + a t a 


= A T A - A T A - A T A + A T A 


= 0 . 

Thus, C = 0, or 


A(A T A)#A T A = A. 


(2) 

A # AA # = 

(A T A) # A t A(A t A) # A t = (A t A) # A t - A # . 

(3) 

(A#A) T = 

f(A T A)# A t a] t = 

B # B 

> • 

T = B # B = (A t A) # A t A 

(4) 

(AA # ) t = 

[a(A t A)# A t ] t = A t (A t A) ,,t A = A T (A T A) T# A 


= A T (A T A) # A = AA # . 


A -5 


A* A, 


The main utility of the Penrose pseudoinverse is contained in the following result. 
Let the system of equations 


AX - B 

be given, and lot 

X Q * A»B, 


Then, for any X, 

I |AX 0 - 5| | < 1 1 AX - b| i . 
If for some X the equality holds, then 

ilx 0 I! < ||x||. 


Thus, the solution given by the Penrose pseudoinverse is the least squares solu- 
tion, If there is more than one least squares solution, the solution given by the 
Penrose pseudoinverse is the smallest solution in norm. 


THEOREM A-6 (Projection Lemma): 


i | h “"*11 Wi i. IN j { | 

||ax 0 -bI! = ^ Iiax-b] 

if and only if 

X t A t (AX 0 - B) = 0 

for all X, 

Proof: Assume that 


X t A t (AX 0 -B) = 


A-6 


0 


fox' all X , and let Y be given by 


Y = X 0 + Z. 

* 


We then have 

1 1 AY - B| | 2 « 1 1 AX 0 - B + AZ| | 2 = ||AX 0 -1|| 2 + 2ZTA^AX 0 -B) + ||AZ|| 2 , 


since by assumption 


Z t A t (AX 0 -B) = 0. 


Then, 


) | AY - 51 1 2 = 1 |AX 0 -IIS 2 + 1 1 AZ‘| | 2 . 

Thus, 

1 1 AY ~ B| | 2 < || AX 0 - 5| | 2 

since 



> 0 . 


We now assume that 


it—* ’“*11 | I h — ♦ 

I |AX 0 - b| I < I I AY “ B 
and wish to show that this implies that 

X t A t (AX 0 -B) = 0 

for all X. Assume that 
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X T A T (AX 0 - B) = a t 0 


for some 

Then, 

1 1 AY- ■ 


which is a 
THEOREM 

where 

and 


l. Let V and Y be defined as 


V = 


-aX 


I |AX| I 


2 > 


Y = X Q + V. 


B 2 - AX, - B 1 2 


|AX ° ' B " | j AX 1 1 


II 2 - 1 1 AX„ - 5| 1 2 


2aX T A t (AX 0 - B) IIaJU* 

+ OT 


1 1 ax) | 


AX 


~2a 2 + a 2 


-a 2 


AX 2 AX 1 2 


I I Ax) | 


2 » 


contradiction. 


A-7; In the system of equations 


AX = B, 


x 0 = A # B 


X /' X„ , 
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either 


1 )AX - B| | > | |AX 0 - B] | 


or 


| AX - B| I = I |AX 0 ~ b| I and I |x| I > I |x 0 | . 


Proof: First we must show that 


XTA T (AX 0 -B) = 0 

as follows: 


X T A t (AA # B - B) = X T A T AA#B - X T A T B 


= X T A t (AA # )B - x t a t b 


= x T A T (AA # ) T B - x t a t b 


= x t a t a# t a t b - x t a t b 


= X T A T B - X T A T B 


= 0 . 


Thus, from theorem A--6, 


— > — * — ♦ — ► 

| | AX - B-| | > I |AX 0 -b!(. 


.assume the equality holds: 

| | AX- - B| | = | |AX 0 - B'| | . 
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then, by hypothesis, 


Max- b| | 2 - llAXj, - b| | 2 = 2y t a t (ax 0 - 1) + | |ay| | 2 

= 1 1 ay] I 2 = 0. 

Now, 

||x 0 || 2 = fix - (X - x 0 ) 1 1 2 = I |x| | 2 + 1 1 X — X 0 1 | 2 — 2X T (X - X' 0 ) 

= I |x| | 2 + ||Y|| 2 -2(X 0+ Y) T Y- 
= I |x| I 2 + I |Y| | 2 - 2 B t A #t Y - 2 1 1 y| 

= | |X| ) 2 — 1 1 Y-| | 2 - 2B' t A #t Y' 

= I |x-| I 2 - | |v| | 2 - 2 B t A #t A t A #t Y 

= | |X| | 2 - | |y| I 2 - 2B t A #t (A # A) t Y 

= | |X| [ 2 - IlY-ir - 2 B t A# t A t (AY). 

Since | |AY| | = 0, it follows that 




APPENDIX B 


PSEUDOINVERSE ALGORITHMS 


There exist many algorithms for the generation of the pseudoinverse of 
Penrose. Unfortunately, not much information is available in terms of their 
efficiency and computer requirements. Such a comparison is presently being 
made and will be the subject of a future report. In the interim, we will present 
two such algorithms which have been used extensively by the author. These 
algorithms possess the desirable property of allowing a limited control of the 
computational rank. 

In the first method, the pseudoinverse is formed by means of the Andree 
algorithm. This routine computes the pseudoinverse by means of the equation 

A# = (A t A) # A t 


Actually, the pseudo inverse of 


B = (A t A) 


is formed. This routine insures the symmetry of B # and also imposes the re- 
quirement that B # be non -negative . Thus, if computational errors leading to the 
loss of the positive definiteness of B exist, they are partially alleviated by this 
scheme . 

The second algorithm, which is basically a Gram -Schmidt orthogonalization 
procedure, can operate directly on the rectangular matrix A. There is no need 
to form the matrix associated with the normal equations, thus one major source 
of computational errors is eliminated. The penalty paid for this lies in the larger 
computer storage required to handle the rectangular matrix. 


ANDREE ALGORITHM 

The algorithm used in this paper is a modification of the Andree algorithm 
by T. S. Englar 1 . The steps of this algorithm are as fallows: 

^Kalman, R. D. arid Englar, T. S., "An Automatic Synthesis Program for Optical Filters and Control 
Systems/' NASA, July 1963 . 



1. Compute A T A or AA T , whichever has the smallest dimension. Call this 
resulting matrix B. It will be sufficient to compute B u , since A# is 
given by 


A # = (A T A) # A T orA # = A T (AA T )^. 

2. Compute a non-singular matrix, S = (s. .), such that 


SBS t = E, 


where E = (e. . ) is a diagonal matrix with elements either zero or one. 
J * 

3. If E = I then B is invertible, and 

B' 1 = S T S. 


If E + I 

then we define the matrix U = (m .) by the following: 

For i ^ 

j » u. , 

j i i j 

- “ s. , 

i } 

if e, . 

= o, 


1 1 



= 0 

if eii 

= 1 . 

For i = 

i , u. . 

= 0 

if e;. 

= 0, 



= 1 

if e. . 

1 1 

= 1 . 


4. Compute 


C = U t BU. 


Delete the rows and columns of C corresponding to C. . ~ 0, and call 
the resulting matrix D. Observe that D is a non-singular matrix of 
rank m. Compute D' 1 by means of the Andree algorithm as in step 2. 
Compute B # by means of 


B-2 


0 


B # 



0 


U T . 


5. Compute A n either by means of 


A # “ B#A T 
or 

A # = A t B^ . 


In the computation of step 2, the generation of the matrix S is done in 
at most 2n steps. The reduction is based upon pivoting in each step 
about the largest of the remaining diagonal elements. If after any step 
of iteration the largest of the remaining diagonal elements is less than 
the product of a preassigned constant, K , and the first pivotal element, 
then the rows and columns containing these elements are set to zero 
thus reducing the rank of the matrix. 


GRAM -SCHMIDT PROCEDURE 

This algorithm 2 permits one to pseudoinvert a rectangular matrix, A, 
directly. The algorithm is based upon partitioning A in the form 

A = (R , RU), 

where all columns of R are linearly independent. The pseudoinverse A # is given 

by 

/ (I + UU T )- 1 R # 

A # - 

\ U T (I + UU 1 )- 1 R # 

2 

Rust, B., Burrus, W. R. and Schneeberger, C., "A Simple Algorithm for computing the Generalized 
Inverse of a Matrix,” Communications of the ACM, Vol. 9, No. 5, May 1966, pp 381-387. 
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This representation may be checked by substitution into the axioms. 

If (a 1} a 2 , . . , a n ) is any set of linearly independent vectors, then it can be 
replaced by an orthonormal set (q r , q 2 , ... q n ) in the following manner: 


*1 




a 2 " ( a 2 


*2 



% 



% 



C 3 ~ a 3 " ( a 3 1l) <ll " ( a 3 ‘J2 


c- - a 

n 


" " £ ( 3 n T ^i) ; 5i • 

i = 1 


Observe that the resulting matrix Q is such that 


Q T Q = I. 


Let us apply this transformation to A = (R, RU) and keep track of the trans- 
formation by applying it simultaneously to the identity matrix partitioned as 



B-4 


A will transform into 


A = (R , RU) ► (Q , 0) 


h 0 


Z X 


0 I 


n -k 


0 I 


n ~ki 


Thus, we must have 


(R , RU) 


0 


= (RZ , RX + RU) = (Q , 0) 


’n -k 


or 


RZ = Q 


R = QZ 


-l 


RX + RU = 0 


X = ~U 


and R # is given by 


R # = Z Q 1 


since 


R # = (R t R)” 1 R t = (Z-^QTqz* 1 ) Z“ iT Q t = ZQ t . 


Thus, the matrices R # and U are generated. It is still necessary to compute the 
matrices (I + UU 7 )" 1 and U T (I + UU 7 )" 1 . The first of these expressions can be 
written as 


(I + UU 7 )- 1 "I - U(U 7 U + I^U 7 . 


If both sides are post-multiplied by (I t UU t ), the identity follows. The second 
expression can be written as 

U T (I + UU 1 )- 1 = (U T U + 1)- 1 U T , 

since 

U T (I+UU T )~ l = U T - U T U(U T U + I)“ l U T 
= - U T U(U T U + I)- 1 U T 

- + (U T U + I)- 1 - (U T U + I)-* 1 - U T U(U T U + I)- 1 j U T 

- + (U T U + I)' 1 *- (I + U T U) (I + U T 

= + (U T U + I)- 1 - i ju T 

- (U T U + I)“ 1 U T . 

If the Gram-Schmidt orthogonali^ ition process is now applied to the matrix 



; ? 

I 

this will transform to j f 


{ 5 . 
> 
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where we have 


'-UP V /-UP 


\ P 


P T U T UP + P T P ~ I , 


or this becomes 


Also, 


and 


P T (U T U + I)P = I 


(U T U + I) s pT-ip-i 


(U T U + I)' 1 = PP T . 


I - U(U T U + I)- l U T = I-UPP T U T 


= I - (UP) (UP ) 1 


(U T U + I)' 1 ^ = P(UP) T . 


Thus, A # takes the form 


A # = 


I - (UP) (UP) T j R # 
P(UP) T R # 
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APPENDIX 0 

ANDREE ALGORITHM SUBROUTINE 


The Andree algorithm for calculating the Penrose pseudoinverse of a matrix 
of equations has been converted into a FORTRAN subroutine. The listing for this 
subroutine appears on the following pages. 


i*N IV w U VLl l* MCU J 


ANDwr r 


')A 1 t. 


6*<l *4 


*/ •*. /23 


WliHUT Iht ANDUf E( V.N .N^.LPb) 

UUUMLf PK'.CIblWN V,t'*T ,*,X ,U.P,l«N,«bOPT 

0 1 MENS t UN V( JO, .JO)fl'<30fJO)«M(3C)*r(.SO»J') I'll ,X< »c* V") 

DIMENSION U{ JO, 10) 

‘JILL s enfc 
NfJ * f>(. 

INC * 0 
U*’H * 0 

00 1,13 I * l,N 
DO 333 J * I. N 
333 H( I . J) * V( I , J) 

«.6r 01) 90 I » l,N 

P< l ) s o. 

DO 97 J * 1 • N 
<3 7 T ( 1 , 3 ) * C . 

9 m r < i • i > * i . 

3b LEI, as L F f ♦ 1 

IF (LFF-M 77,77,70 
77 P * 0. 

K 2 Q 

OfJ 23 I * I ,N 
IF <«(!)) 22.124,2 ? 

124 IF (7(1,1)) 02,02,03 
02 DU 64 J * 1,N 

V< I , J ) = C . 

64 V< J, I ) b 0. 

00 TO 22 

63 IF (P-V( 1,1)) 21 .22*22 
2 1 P * V < I » 1 ) 

K = I 

IF (Lfc.0 ,F 0 * I ) DM AX * P 
42 CONI INUF 

IF (P-lO.**0U.t,>DMAX/2,**NB) 26,20.7 
7 R < K ) « K 

P sr DSOOTI V ( K , K ) ) 

DO 19 I s t.N 
V ( I »K I = V( I ,K)/P 
T(I.K) = 7 ( I , K ) / P 

V { K , I ) b V(K. I )/P 
19 T ( K , I ) = T(K.I)/P 

V { K , K ) = I . 

T(K.K) = l./P 
DO 25 1 ~ 1 , N 

IF < I — K > 26,125.26 

125 DU 126 3 2 1 , N 

IF (1-J) 127,120,147 

127 W ( I « J ) = 0. 

X< I , J) s T( I , J) 

GO TQ 126 
120 M ( I , I ) s 1 , 

X( I, 1 ) = T< I . I ) 

126 CONTINUE 
GC TO 25 

26 DU 1C J = 1 ,N 

W(I,J) = V( I , J)-V< I ,K )*V(K,J) 

X(I,J) = T(I,J)-V<I,M*T(K,J) 



02 


IV u LEVEL I . MOO l 


ANDRCE 


DATE = 631 UO 


00/2 j/23 


1 3 CONTI NUF; 

2 5 CONTINUE 

DO 24 I = l * N 
DO 24 U s I • N 
V ( T » J ) = w< I . J) 

24 T(l.J> = X< I . J> 

60 TO 35 

2« DO 3 0 I s 1 * N 

IF (R( I > > 30.34,3 0 
34 UO 33 J = I.N 

V ( l » U > = C. 

3 1 V ( J • I ) - C. 

30 CONTINUE 

70 BILL = 0. 

IF ( I ND ) 33.39.3P 
3V JUE = N 

DO 40 I = I.N 
IF ( V ( I . I ) ) 40, 42,40 

42 JUE ~ JOF-l 
40 CONTINUE 

IF (JOE-N) 43,44,43 
44 DO 45 I = I.N 
DO 45 11 = 1 ,N 

RN = 0. 

DO 46 J o I.N 
46 RR = RWFT( J, I )*T< J. I I 

V ( I , I I > = RR 
4t, CONTINUE 

GO TO 36 

4 3 DO 4 7 I - 1 , N 

IF ( R ( I ) ) 46,49.46 

49 00 60 J = I.N 

60 U< I , J) = - T ( I . J> 

U< 1 , I ) ~ C . 

GO TO 47 

46 DO 61 J - I.N 

61 U( I , J) = 0. 

U< I , I ) = 1 . 

4/ CONTINUE 

DO 5 C I = I.N ' 

DO 50 II = I.N 
RR = 0. 

DU 61 J = I.N 

51 HR = NHFUI J . I I *13 ( J . I I) 
W< I . I I > = RR 

50 CUNTINUE 

DO 5? I = I.N 
DO 5? II = I.N 
«K = 0. 

DO 53 J = I.N 
53 i<R = RR + W< I . J)*U( J«! I > 
V ( I , I I > = RR 

52 CONTINUE 
IND = l 
LEE = 0 
GO TO 666 
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IV 0 LEVEL 

1 * 

MCD 1 ANDREE 

38 

00 

54 I = 1 * N 


DO 

54 It - 1 • N 


MR 

= 0. 


DO 

55 J = 1 . N 

55 

KH 

= RWf T( J. 1 )*T( J. l 1 ) 


* ( I 

.11) = RR 

54 

CON 1 INUP 


00 

60 I = !,N 


00 

66 1 ! * l ,N 


UK 

= 0 ■ 


00 

57 J - i » N 

57 

i< H 

= RR + U( l , J )*W< J, l l ) 

5o 

T ( I 

•II) = KW 


00 

58 I = 1 ,N 


DO 

58 II = t.N 


KW 

= 0. 


DO 

59 J = l.N 

59 

Rt< 

= WHfT(I.JM‘U( 11. J) 


VI I 

.11) “ RW 

58 

CONTINUE 

36 

NiV 

= JOE 


WET 

OWN 


END 



DATE = 60180 


09/20/23 
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APPENDIX D 


GRAM-SCHMIDT PSEUDOINVERSION SUBROUTINE 


The Gram -Schmidt procedure for calculating the Penrose pseudoinverse of 
a matrix of equations has been converted into a FORTRAN subroutine. The 
listing for this subroutine appears on the following pages. 


an IV VJ Lf Vf.L 1, MOO 


G I NV2 


') A TP = 6H1H0 


SUtlNOL-r INf C.INV?( A.O. AfL AG, ATfcMP , MP , NR , NC , NM I , LP 3 1 
OOUHLL PWCCISUIN A( ,NC ) .U(NC iNC) . AFLACJt NC) ,ATFMP( NC I 
OUUPLt PWtCISlON f AL ,CCT ,D(JT l ,nr.'T2 .TOL.DSCRT 
00 1C I - 1 • NC 

n b j = i , nc 
b U ( l . J ) = C . 

1C U ( I . I ) = 1 . 

P AC =• DOT ( MW , NK , A , 1 , i ) 

(•AC = l .A,S iPTCFACl 
00 tb I = 1| NR 

lb A ( I . I 1 - A { 1 » 1 ) *FAC 
Jl) ^ 1 = 1 .NC 

23 U(l.t) = 0(1, l ) *FAC‘ 

AF L AO ( 11 = 1 . 

N = 5 O 

NW 1 = NC 

TUL = { 1 0 . **LPS* .5 **N ) +*2 

00 IOC J = F.NC 

0UT1 = DLT { MR ,NF , A . J. J ) 

JMl = J-| 

DO 5 0 L = 1 , a 
00 JO K = 1.JV1 

JO ATFMPIK) = 0(JT< MW ,NP, A . J ,K ) 

DO A 5 X s 1 i JM1 
DO 35 t = I ,NR 

35 A(I.J) = A ( l , J ) -ATFMP1 K > *A ( I , K >*AFLAG(K ) 

DU 40 I = 1 • NC 

4 0 U(I.J) U< I . J ) — A T F MP( K) *U< I ,K > 

4 b continue: 

SO COM I NUF 

1)11? r |>L1 (MR, NR, A, J, J) 

If ( ( DlOTC/DUT 1 1 - rOL 1 bb.bb.7C 
bb DU 60 { - l.JMl 

ATfNP(t) = 0. 

DO 6 C «, = 1,1 

oO ATLMP(I) = A TFMP( I ) fU ( K , 1 ) +U ( K, J ) 

DO nb I ~ l.NW 
A ( I , J > = 0 . 

DO 65 K =• l.JMl 

6b A(l.J) = A( 1 . J ) — A ( I » K ) * A T F MP ( K ) ♦ AF L A L( K ) 

AFIAG(J) = 0. 

FAC = DCTCNC.NC.O, J.J ) 

F AC - 1 ,/OSOPT ( F- AC ) 

NPl = NHl-1 
G'J TO 7b 

7 C AFLAG ( J ) = 1 . 

TAC = l ,/0 SORT (DOT a) 

73 DJ 6C 1 = l.NW 

SO A ( I , J > = A < 1 » J ) *F AC 

DO B 5 I = 1 , NC 
as O ( 1 . J ) = L ( I , J 1 * F A c 
1 0 0 CONTINUE: 

DO 1 3C J = 1 . NC 

DO l JO » = I, NR 

FAC = 0. 

DU 1*0 K = J.NC 


00/20/23 


h ‘ 


D-2 








»N [V U ULVE.L It MUD l LUT DATti ss bB'lOO 

OUUJLh PldCISIDN FUNCTI'l.s OUT < Mf> t NR t A t J t K > 

OilUHLL PHtCISlON A(VM,|),X 

x * C .no 

UU 5C I = I .NN 
X a X ♦ A ( | » J I * A ( J » K ) 

50 CONTINUE 

onr = x 

WETUKh 

END 
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